1. General comments. The interesting paper of Davies and Gather (henceforth
[DG]) pulls together results on upper bounds on the breakdown value of
translation equivariant location estimators [Donoho (1982)], regression
estimators [Rousseeuw (1984)] and affine equivariant scatter estimators
[Davies (1987)] into a single framework of group equivariance. I can only
agree with them on the important role of the latter notion in obtaining
nontrivial bounds. [By the way, I prefer the term breakdown value myself
because it is not a point, and the term "value" captures both its dimension
(one) and its orientation (we aim for higher, not lower values).]

The theory in [DG] is formulated for estimators that are uniquely defined,
but it seems to work just as well in the general case. Then $T(P)$ is a set, and
we can follow the implicit convention of saying that it breaks down when
any member of $T(P)$ does. We only need to redefine $D(T(P), T(Q))$ in (2.4)
as a supremum over all pairs of members of $T(P)$ and $T(Q)$.

The new applications of the theory are fascinating, for example, to the
Michaelis-Menten model (with a nontrivial bound) and logistic regression
(without one). I am less convinced by the fragility argument illustrated by
the difference between the contaminated samples (6.2) and (6.3). It is true
that this proof of the upper bound only covers (6.2), but in some sense that is
enough, since breakdown is a worst-case concept and the bound is not specific
to the median but holds for all translation equivariant estimators. In any
case, the behavior of the median at (6.3) can be derived from that at (6.2)
through a variety of other properties that it possesses. For instance, its
monotonicity property alone suffices. Or we can use the property that when
the observations are moved over distances of at most $\delta$, the median
changes by at most $\delta$. This holds for any $\delta > 0$, and is a
Lipschitz property for the metric on samples defined as

(1.1)   $d((x_1, \ldots, x_n), (y_1, \ldots, y_n)) = \min_{\pi \in S_n} \max_i |x_i - y_{\pi(i)}|$,

where $S_n$ is the set of all permutations on $\{1, \ldots, n\}$. Note that (1.1) is
equal to $\max_i |x_{i:n} - y_{i:n}|$ [see Rousseeuw and Leroy (1987), pages 127-128].
People who compute maxbias curves always use the properties of the
actual estimator. Perhaps we should not expect much more elegant results
for contaminated samples that break down the estimator than for those that
yield a finite bias.
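
As a small numerical illustration of (1.1), the sketch below checks by brute force on random small samples that the permutation form of the metric agrees with the order-statistic form, and that the median is Lipschitz with constant 1 for it. The function names are hypothetical and chosen only for this example.

```python
from itertools import permutations
import random

def perm_distance(x, y):
    """Metric (1.1): min over permutations pi of max_i |x_i - y_pi(i)| (brute force)."""
    n = len(x)
    return min(max(abs(x[i] - y[p[i]]) for i in range(n))
               for p in permutations(range(n)))

def sorted_distance(x, y):
    """Order-statistic form: max_i |x_(i:n) - y_(i:n)|."""
    return max(abs(a - b) for a, b in zip(sorted(x), sorted(y)))

def median(x):
    """Sample median (average of the two middle order statistics when n is even)."""
    s = sorted(x)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else 0.5 * (s[mid - 1] + s[mid])

random.seed(0)
for _ in range(200):
    n = random.randint(1, 6)  # keep n small: the brute force costs n! permutations
    x = [random.uniform(-10, 10) for _ in range(n)]
    y = [random.uniform(-10, 10) for _ in range(n)]
    d = perm_distance(x, y)
    assert abs(d - sorted_distance(x, y)) < 1e-12      # the two forms agree
    assert abs(median(x) - median(y)) <= d + 1e-12     # Lipschitz property
```

The brute-force search is only there to confirm the identity; in practice the sorted form would be used.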

2. The maximal breakdown value of affine equivariant location estimators.
From here on I will focus on the open problem in Section 5.2 of [DG].
It has been known since Donoho (1982) that the finite-sample breakdown
value (fsbv) of translation equivariant estimators of location is at most
$\lfloor (n+1)/2 \rfloor / n$ and that this bound is sharp. The bound obviously holds
also for affine equivariant location estimators, but it may not be sharp
for them. In one dimension ($k = 1$) it is, but for $k \ge 2$ this has been an
open problem for over 20 years. During that time many affine location
estimators were constructed with an fsbv of $\lfloor (n-k+1)/2 \rfloor / n$, such as the
MVE and MCD of Rousseeuw (1984), location S-estimators [Davies (1987),
Rousseeuw and Leroy (1987)] and a modification of the Stahel-Donoho
estimator [Tyler (1994), Gather and Hilker (1997)]. Since $\lfloor (n-k+1)/2 \rfloor / n$ is
known to be the sharp upper bound for affine equivariant scatter estimators
[Davies (1987)], it has seemed plausible that it could also be the upper bound
for affine location. Over the years there have been several attempts to attain
the upper bound $\lfloor (n+1)/2 \rfloor / n$, but as far as I know none has succeeded.
[The result in Zuo (2004) does not count because it uses a weaker version of
the fsbv which requires that all the contaminating points coincide.]
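
To get a feel for the gap between the two bounds discussed above, the following sketch tabulates them exactly for a few sample sizes and dimensions. The function names and the chosen $(n, k)$ pairs are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction

def translation_bound(n):
    """floor((n + 1)/2)/n: upper bound for translation equivariant location estimators."""
    return Fraction((n + 1) // 2, n)

def affine_attained(n, k):
    """floor((n - k + 1)/2)/n: the fsbv attained by MVE, MCD, S-estimators, etc."""
    return Fraction((n - k + 1) // 2, n)

# Print n, k, both values and their gap for a few illustrative cases.
for n, k in [(20, 2), (50, 5), (101, 10)]:
    tb, aa = translation_bound(n), affine_attained(n, k)
    print(n, k, tb, aa, tb - aa)
```

The gap is of order $k/(2n)$, so it vanishes asymptotically but matters for small samples in higher dimensions.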

Let us consider any data set $X = \{x_1, \ldots, x_n\} \subset \mathbb{R}^k$ (from here on
always $k \ge 2$ and $n > k$) which is in general position (GP). By GP we mean
that no more than $k$ data points lie on any affine hyperplane. This holds
a.s. when sampling from an absolutely continuous distribution. The convex
hull conv($X$) is then a polytope with faces that contain exactly $k$ data
points. [In $\mathbb{R}^3$ the faces are two-dimensional, and in general they are
$(k-1)$-dimensional.] Note that conv($X$) can be stretched arbitrarily by replacing
even a single point of $X$ by an outlier. Since we are studying very robust
estimators $T$, it is natural to require that $T$ should not lie on the boundary
of conv($X$) or outside of conv($X$). A slightly more general formulation of
this requirement is the following condition:

($C_h$) Let $X = \{x_1, \ldots, x_n\} \subset \mathbb{R}^k$ be in general position, with $n > k \ge 2$.
Let $u$ be a direction such that the inner products $y_i = u'x_i$ satisfy
$y_1 = \cdots = y_h < y_{h+1} \le \cdots \le y_n$ (after renumbering) for the specified
number $h$, with $1 \le h \le k$. Then there exists an $\alpha > 0$ (which depends
only on $k$ and the $y_1, \ldots, y_n$) such that $u'T(X) \ge y_h + \alpha$.
The typical case is to take $h = k$. For any face of conv($X$) we can take the
orthogonal direction $u$ pointing to the inside of conv($X$), so Condition ($C_k$)
says that $T$ cannot lie on or arbitrarily close to the boundary of conv($X$)
or outside of it. [Note that conv($X$) is the intersection of halfspaces
containing $X$ and having a face of conv($X$) on their boundary.] For $h < k$ the
condition becomes somewhat weaker; for example, Condition ($C_1$) only says
that $T$ cannot come arbitrarily close to a vertex of conv($X$) or lie outside
of conv($X$).

Condition ($C_h$) is intuitive for a robust estimator. For instance,
Condition ($C_k$) holds for all estimators that can be written as a weighted mean
$(\sum_i w_i x_i) / (\sum_i w_i)$ where $0 \le w_i \le 1$ and at least $k + 1$ of the $w_i$ equal 1 [it
suffices to put $\alpha = (y_{k+1} - y_k)/n$]. This encompasses, for example, trimmed
means and the minimum covariance determinant estimator (MCD). Moreover,
a robust estimator would typically be expected to have a reasonably
large Tukey depth, for example,

(2.2)   $\mathrm{depth}(T, X) \ge k + 1$

(at least for large enough $n$, when there are many depth contours).
Condition (2.2) implies Condition ($C_k$) and is another way of saying that $T$ should
not be in the outskirts of the data cloud.
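
The weighted-mean claim can be checked numerically on the projections alone, since the inner product of $u$ with a weighted mean of the $x_i$ is the same weighted mean of the $y_i$. The sketch below is a random check under that reduction; `check_once`, the sampling ranges and the tolerance are assumptions made for the example.

```python
import random

def weighted_mean(values, weights):
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def check_once(rng, n, k):
    """Check that the weighted mean of the projections is >= y_k + (y_{k+1} - y_k)/n."""
    # Projections y_i = u'x_i: k values tied at the minimum (the face), the rest larger.
    y_min = rng.uniform(-5, 5)
    rest = sorted(y_min + rng.uniform(0.1, 10) for _ in range(n - k))
    y = [y_min] * k + rest
    # Weights in [0, 1], with at least k + 1 of them equal to 1 at random positions.
    w = [rng.random() for _ in range(n)]
    for i in rng.sample(range(n), k + 1):
        w[i] = 1.0
    alpha = (y[k] - y[k - 1]) / n  # (y_{k+1} - y_k)/n in the 1-based notation of the text
    return weighted_mean(y, w) >= y[k - 1] + alpha - 1e-9

rng = random.Random(0)
print(all(check_once(rng, n=12, k=3) for _ in range(1000)))
```

The reason it works is that at most $k$ of the unit weights can sit on the tied points, so at least one point with $y_i \ge y_{k+1}$ carries weight 1, while the total weight is at most $n$.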

Theorem 1. Consider a data set $X = \{x_1, \ldots, x_n\} \subset \mathbb{R}^k$ in general
position with $n > k$. Let $T$ be an affine equivariant location estimator satisfying
Condition ($C_h$) with $1 \le h \le k$. Then $\mathrm{fsbv}(T, X) \le \lfloor (n-h+1)/2 \rfloor / n$.

Proof. Put $\theta := T(X) \in \mathbb{R}^k$. Since $X$ is in GP, conv($X \cup \{\theta\}$) has at
least one face not containing $\theta$. Take an $h$-subset $S$ of the $k$ data points on
this face. Then there exists an affine hyperplane $L$ which contains $S$ such
that both $\theta$ and $X \setminus S$ lie strictly on the same side of $L$. Assume w.l.o.g.
that $0 \in L$. Denote the unit normal vector to $L$ in the direction of $X \setminus S$ as
$e_1$ and take an orthonormal basis $\{e_2, \ldots, e_k\}$ of $L$. After renumbering, the
$x_{i1} := e_1'x_i$ satisfy $0 = x_{11} = \cdots = x_{h1} < x_{h+1,1} \le \cdots \le x_{n1}$, hence
$\theta_1 \ge \alpha > 0$ by Condition ($C_h$).

Let us assume that $T$ cannot be broken down by replacing any $m$-subset
$B$ of $X$, where $m = \lfloor (n-h+1)/2 \rfloor$, by an arbitrary $m$-set $B'$ yielding the
contaminated data set $X' := (X \setminus B) \cup B'$. This means that there exists a
finite radius $M$ such that for any contaminated data set $X'$ of this type it
holds that $T(X') \in B(\theta, M) := \{x \in \mathbb{R}^k;\ \|x - \theta\| \le M\}$.

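We will now construct a linear transformation which leaves $S$ invariant
and moves $X \setminus S$ as well as $\theta$. For this we consider the "shear transform" $g_\gamma$
given by the nonsingular matrix

(2.3)   $g_\gamma = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ \gamma & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix}$

relative to the basis $\{e_1, \ldots, e_k\}$, for $\gamma \in \mathbb{R}$. We note that $g_\gamma(e_j) = e_j$ for
all $j \ne 1$, hence $g_\gamma(x_i) = x_i$ for $i = 1, \ldots, h$, but at the same time
$g_\gamma(e_1) = e_1 + \gamma e_2$. Denoting $\theta = (\theta_1, \ldots, \theta_k)^T$, we find
$g_\gamma(\theta) = (\theta_1, \theta_2 + \gamma\theta_1, \theta_3, \ldots, \theta_k)^T$
with $\theta_1 > 0$, hence $\|g_\gamma(\theta) - \theta\| = |\gamma|\theta_1$ goes to infinity for increasing $\gamma$.
Analogously, the image of any data point $x_i$ with $i = h+1, \ldots, n$ is of the form
$g_\gamma(x_i) = x_i + \gamma x_{i1} e_2$, so all $g_\gamma(x_i)$ move in the direction of $e_2$ and
$(g_\gamma(x_i))_1 = x_{i1}$. Each point travels a distance
$\|g_\gamma(x_i) - x_i\| = |\gamma||x_{i1}| \ge |\gamma||x_{h+1,1}|$.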
Let us partition $X \setminus S$ into two sets $A$ and $B$ with $|B| = m = \lfloor (n-h+1)/2 \rfloor$
and $|A| = n - h - |B|$. (If $n - h$ is even, we find $|A| = |B|$, whereas for odd
$n - h$ we have $|A| = |B| - 1$.) We will replace $B$ by $B_\gamma := g_\gamma(B)$ yielding the
contaminated data set $X'_\gamma := S \cup A \cup B_\gamma$. Note that $X'_\gamma$ is in GP for all but
a finite number of $\gamma$ values. Put $\Gamma = \{\gamma;\ X'_\gamma \text{ is in GP}\}$. For all $\gamma \in \Gamma$ it holds
that $T(X'_\gamma) \in H := \{z \in \mathbb{R}^k;\ z_1 \ge \alpha\}$ by Condition ($C_h$).
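
The sizes of the two parts can be tabulated directly. This small sketch (the function name is an assumption) confirms the even and odd cases mentioned in the parenthesis, and that $|A|$ never exceeds $m$.

```python
def partition_sizes(n, h):
    """Return (|A|, |B|) for the partition of the n - h points of X \\ S."""
    m = (n - h + 1) // 2   # |B| = m = floor((n - h + 1)/2)
    return n - h - m, m

for n, h in [(10, 2), (11, 2), (25, 3), (26, 3)]:
    size_a, size_b = partition_sizes(n, h)
    print(n, h, size_a, size_b, size_a <= size_b)
```

That $|A| \le m$ matters later in the proof, because $A$ is then also a set that the contamination model allows to be replaced.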

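For any $\gamma$ the image of $B(\theta, M)$ through $g_\gamma$ is an ellipsoid with center
$g_\gamma(\theta)$. For a large enough $\gamma \in \Gamma$ it holds that $B(\theta, M) \cap g_\gamma(B(\theta, M)) \cap H = \emptyset$.
We know that $T(X'_\gamma) \in B(\theta, M)$ by assumption. On the other hand, we can
also write $X'_\gamma = g_\gamma(S \cup A_{-\gamma} \cup B)$, where $A_{-\gamma} := g_{-\gamma}(A)$ by analogy with
$B_\gamma$ and $|A| \le m$, which implies $T(X'_\gamma) \in g_\gamma(B(\theta, M))$ by affine equivariance. Since
$T(X'_\gamma) \in H$ it follows that $T(X'_\gamma) \in B(\theta, M) \cap g_\gamma(B(\theta, M)) \cap H = \emptyset$. This
contradiction proves the desired upper bound on fsbv. □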
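
The emptiness of the triple intersection for large $|\gamma|$ can be verified coordinatewise. The following is a sketch in the notation of the proof rather than part of the original argument, and the threshold $2M/\alpha$ is just one convenient choice.

```latex
% Suppose z \in B(\theta, M) \cap H and z = g_\gamma(w) for some w \in B(\theta, M).
% Then z_1 \ge \alpha > 0 and |z_2 - \theta_2| \le M, and
\begin{align*}
w &= g_{-\gamma}(z) = z - \gamma z_1 e_2, \\
|w_2 - \theta_2| &= |(z_2 - \theta_2) - \gamma z_1|
   \ge |\gamma|\, z_1 - |z_2 - \theta_2|
   \ge |\gamma|\,\alpha - M .
\end{align*}
% For |\gamma| > 2M/\alpha this gives |w_2 - \theta_2| > M, which contradicts
% \|w - \theta\| \le M. Hence no such z exists.
```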

In the typical case where $h = k$, Theorem 1 yields the upper bound
$\lfloor (n-k+1)/2 \rfloor / n$ which has been attained. This says that any affine location
estimator $T$ with a higher fsbv must be somewhat strange in the sense of
not satisfying Condition ($C_k$), so $T$ can be arbitrarily close to the boundary
of conv($X$) or even lie outside it. Any $T$ which were to attain the translation
equivariant bound $\lfloor (n+1)/2 \rfloor / n$ cannot even satisfy Condition ($C_1$), so at
times it must be arbitrarily close to a vertex of conv($X$) or lie outside it. It
is counterintuitive that an estimator with maximal fsbv would have such a
low Tukey depth (at most 1).

So far the only published result with higher fsbv than $\lfloor (n-k+1)/2 \rfloor / n$ is
the projection median (PM) of Zuo (2003), which attains $\lfloor (n-k+2)/2 \rfloor / n$
by using a univariate scale estimator $\mathrm{MAD}_{k-1}$ in its definition. By
Theorem 1, this estimator cannot satisfy Condition ($C_k$). Here is a bivariate
counterexample (which can be extended to $\mathbb{R}^k$). Start with the data points
$z_1 = (0, \delta)$ and $z_2 = (0, -\delta)$ for some $\delta > 0$. Add $m$ points $(x_i, y_i)$ with $x_i$
equispaced between 10 and 20 and $y_i = x_i + \delta u_i$ where the noise $u_i$ is such
that these points are in GP. Add another $m$ points with the same $x_i$ but
with $-y_i$. Then the $n = 2m + 2$ points of $Z$ are in GP for all but finitely
many $\delta$. When $\delta \to 0$, the outlyingness Out(0, 0) tends to the outlyingness
of 0 relative to $\{0, 0, x_1, x_1, \ldots, x_m, x_m\}$; hence for any $0 < \delta < 1$ we have
Out(0, 0) $\le M$ for some $M < \infty$. We will prove that for any $\varepsilon > 0$ there is a
$\delta_0 > 0$ such that $\delta < \delta_0$ implies $\|\mathrm{PM}(Z)\| < \varepsilon$. By projecting in the direction
orthogonal to $y = -x$ we see that $\mathrm{MAD}_1$ tends to 0, so for small enough $\delta$
all points (not necessarily data points) in $\mathbb{R}^2$ lying farther than $\varepsilon/\sqrt{2}$ away
from the line $y = -x$ have Out $> M$. The same holds for points farther than
$\varepsilon/\sqrt{2}$ from $y = x$. Therefore PM $\to (0, 0)$; hence $\alpha$ in Condition ($C_k$) is zero.
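
The key step of the counterexample, that the scale of the projections orthogonal to $y = -x$ shrinks with $\delta$, can be illustrated numerically. This is only a sketch: it uses the ordinary MAD as a stand-in for the modified scale estimator in the text (an assumption), draws the noise $u_i$ uniformly from $(-1, 1)$ without verifying GP, and does not compute the projection median itself. The function names and the value $m = 10$ are made up for the example.

```python
import math
import random
import statistics

def build_Z(delta, m=10, seed=0):
    """Data set of the counterexample: z1, z2, and m points near each of y = x and y = -x."""
    rng = random.Random(seed)
    xs = [10 + 10 * i / (m - 1) for i in range(m)]   # equispaced between 10 and 20
    us = [rng.uniform(-1, 1) for _ in range(m)]      # assumed noise, |u_i| < 1
    upper = [(x, x + delta * u) for x, u in zip(xs, us)]
    lower = [(x, -y) for x, y in upper]
    return [(0.0, delta), (0.0, -delta)] + upper + lower

def projected_mad(delta, m=10, seed=0):
    """Ordinary MAD of the projections of Z on (1, 1)/sqrt(2), orthogonal to y = -x."""
    v = (1 / math.sqrt(2), 1 / math.sqrt(2))
    proj = [v[0] * a + v[1] * b for a, b in build_Z(delta, m, seed)]
    med = statistics.median(proj)
    return statistics.median(abs(p - med) for p in proj)

for delta in [1e-1, 1e-2, 1e-3]:
    print(delta, projected_mad(delta))
```

Because $m + 2$ of the $2m + 2$ projections lie within $\delta/\sqrt{2}$ of zero, both the median and the MAD of the projections are of order $\delta$, so outlyingness in this direction blows up for any fixed point away from the line.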

Note that Theorem 1 fits in the framework of [DG] with $G$ the affine group
on $\mathbb{R}^k$. The main difference is that here we first fix a set $B$ (our $h$-subset)
and then a subgroup of $G$ which keeps $B$ invariant, whereas condition (3.3)
in [DG] is over many possible $B$. Afterward we put $g := g_1$ [i.e., (2.3) with
$\gamma = 1$], yielding $\Delta(P_n) = h/n$. The remainder of the proof of Theorem 3.1
in [DG] can then be retraced by noting that for any integer $m$ it holds that
$g^m = g_m$ (the shear transform with $\gamma = m$). We basically set aside $h$ points
and then apply our usual reasoning to the remaining $n - h$ points.
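
The identity $g^m = g_m$ follows from the additive structure of the shears. A short derivation is sketched below, where $E_{21}$ (notation assumed here) is the $k \times k$ matrix with a single 1 in row 2, column 1, so that $g_\gamma(e_1) = e_1 + \gamma e_2$ and $g_\gamma(e_j) = e_j$ for $j \ne 1$.

```latex
\begin{align*}
g_\gamma &= I + \gamma E_{21}, \qquad E_{21}^2 = 0, \\
g_\gamma\, g_{\gamma'} &= I + (\gamma + \gamma') E_{21} + \gamma\gamma' E_{21}^2 = g_{\gamma + \gamma'}, \\
g^m &= g_1^m = g_m \quad \text{for every integer } m .
\end{align*}
```

In particular $g_{-\gamma} = g_\gamma^{-1}$, so the shears form a one-parameter subgroup of the affine group.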

Also note that Condition ($C_h$) and Theorem 1 can be extended to situations
without general position. As long as $T$ satisfies Condition ($C_h$) without
the GP condition (this is a stronger assumption), and $X$ does have $h$ points
whose inner products with some $u$ satisfy $y_1 = \cdots = y_h < y_{h+1} \le \cdots \le y_n$,
the upper bound $\mathrm{fsbv}(T, X) \le \lfloor (n-h+1)/2 \rfloor / n$ holds. In this situation it
is even allowed that $h > k$ (which could not happen under GP).